Automatic Clustering of Excited-State Trajectories: Application to Photoexcited Dynamics

We introduce automatic clustering as a computationally efficient tool for classifying and interpreting trajectories from simulations of photo-excited dynamics. Trajectories are treated as time-series data, with the features for clustering selected by variance mapping of normalized data. The L2-norm and dynamic time warping are proposed as suitable similarity measures for calculating the distance matrices, and these are clustered using the unsupervised density-based DBSCAN algorithm. The silhouette coefficient and the number of trajectories classified as noise are used as quality measures for the clustering. The ability of clustering to provide rapid overview of large and complex trajectory data sets, and its utility for extracting chemical and physical insight, is demonstrated on trajectories corresponding to the photochemical ring-opening reaction of 1,3-cyclohexadiene, noting that the clustering can be used to generate reduced dimensionality representations in an unbiased manner.


INTRODUCTION
Photoexcited dynamics is complex, with the nuclear wave packet evolving across coupled multidimensional potential energy surfaces (PESs) on ultrashort timescales.Experimental techniques such as time-resolved spectroscopy, scattering, or Coulomb explosion imaging 1−5 can offer invaluable insights, especially when combined with theory and simulations.Quite often, the key contributions to the experimental observable are attributed with the help of trajectory-based simulations using on-the-fly quantum dynamics or mixed quantum-classical methods such as surface hopping, 6 ab initio multiple spawning, 7 multiconfigurational Ehrenfest, 8,9 and direct dynamics variational multiconfigurational Gaussian. 10The counterpoint to the increasing sophistication and accuracy of the simulations is the challenge of finding simple interpretations, which is not straightforward given that the trajectories encompass the inherent multidimensional complexity of the dynamics.In this context, an unbiased and objective method for arriving at an understanding of the key features of the dynamics, ideally also in terms of a reduced dimensionality model, would have great utility.
In recent years, computational chemistry has seen an increase in unsupervised and supervised machine learning algorithms. 11Moreover, unsupervised clustering algorithms that aim to identify key patterns and predict future observations have seen broad success in a wide range of fields.Applications to time-dependent data include the analysis of global positioning system data, 12 the tracking of patterns in animal migration, 13 the categorization of medical time-series data, 14 and the deduction of trends in financial data and associated forecasting. 15In light of this, we explore the application of clustering algorithms capable of automatically categorizing the key aspects of trajectory data, providing a better overview of the multidimensional dynamics.This allows for the identification and classification of the decay or reaction pathways available to an excited system, with clusters of similar trajectories representing concentrations of reaction flux.−21 The complexity and computational cost of calculating accurate observables from simulations explains the persistence of simple approximations such as the independent atom model 22,23 or Dyson norm calculations 24 for scattering and photoelectron spectroscopy, respectively.Highly accurate calculations of total scattering signals from ab initio wavefunctions 25−28 or photoelectron spectra using sophisticated Rmatrix codes 29 are computationally expensive.Clustering allows a reduced dimensionality model of the dynamics to be constructed using representative trajectories from each cluster.This model, in turn, can be used to calculate observables. 30hus, we anticipate that effective clustering will enable more accurate calculations of observables.
While clustering algorithms have been used extensively to cluster molecular geometries taken from (ground electronic state) classical molecular dynamics simulations and have seen success in classifying the microstates sampled by the trajectory, e.g., in terms of protein conformations, 31−35 the problem of clustering trajectories from excited-state simulations poses unique challenges.First of all, the task is not to cluster individual molecular geometries but rather continuous temporal sequences of geometries.The density and spatial distribution of the trajectories can vary significantly, and since the trajectories propagate across different electronic states and across different parts of the topography of each PES, their characteristic motions and frequencies can vary considerably.Furthermore, as copropagating trajectories transition between different electronic states, either by internal conversion or intersystem crossing, they can diverge, reflecting the branching of the nuclear wave packet.Finally, in contrast to standard force-field-based molecular dynamics simulations, the chemical structure of molecules during photoexcited processes often changes as bonds are made or broken.All these aspects make clustering in the context of photochemical reactions an interesting, challenging, and important problem.
The rest of this paper is structured as follows.In Section 2, we introduce trajectory data that models the electrocyclic ringopening dynamics of 1,3-cyclohexadiene (CHD), followed by a discussion of the data preprocessing and clustering procedure.In Section 3, we discuss the results of applying clustering algorithms to the trajectory data and the limitations encountered.Finally, in Section 4, we present the conclusions and future avenues for further exploration.

METHODS
Generally, a clustering algorithm aims to categorize a large dataset into distinct clusters, such that the intracluster variance is minimized and the intercluster variance is maximized. 36The application of clustering algorithms to time-dependent data to identify temporal patterns and characteristics is inherently more challenging than time-independent applications.This is a consequence of the fact that data points are no longer independent and that correlation between points in the same temporal sequence must be considered.In addition, different sequences may have different lengths, with local variations in timing or temporal alignment, and typically have high dimensionality.

Model Trajectories.
To test the application of clustering algorithms to photoexcited trajectory data, we utilize an ensemble of 100 semi-classical ab initio multiconfigurational Ehrenfest (AIMCE) trajectories that model the ring-opening reaction of CHD to 1,3,5-hexatriene (HT), with the electronic structure calculated at the SA3-CASSCF(6,4)/cc-pVDZ level, a brief overview of the AIMCE method is given in the Supporting Information.These trajectories are taken from ref 37.The overall physical situation is illustrated in Figure 1.Each trajectory evolves in time across the 36 internal geometry coordinates and the populations of the three electronic states considered in the dynamics.Figure 2 shows the populations of the three adiabatic singlet states included over the 150 fs simulation period.
−64 Following excitation to the S 1 (1B) electronic state, the wave packet rapidly accelerates out of the Franck−Condon region, with the conrotatory twist of the terminal carbon atoms already taking place before the 2A/1B conical intersection (CoIn) is reached. 46,49−48 The acceleration from the slope of the cusp region results in rapid decay to the ground S 0 state, forming either HT (initially cZc-HT) or returning to vibrationally hot CHD.
In Figure 2, we observe the onset in S 0 population rise at 25 fs.The S 0 population continues to rise as the S 1 and S 2 states stabilize at around 70 fs, by which time a large portion of the wave packet has reached the S 0 ground state.Since the formation of cZc-HT occurs with an energy in excess of the barrier to isomerization on the ground state, the full range of cZc-, cZt-, and tZt-HT isomers appear.At times greater than  70 fs, the AIMCE simulations as shown in Figure 2 show that the populations stabilize with significant S 1 population.At these intermediate times, the S 1 state is of 2A character, and the remainder of the wave packet undergoes several more oscillations along the near flat 2A surface, causing it to disperse.This results in a broader distribution of geometries that have a higher C 1 −C 6 torsion angle, which eventually decay to the ground state, potentially resulting in a small fraction of the more open E-HT isomers. 50The majority of experimental and theoretical studies estimate the total excited-state lifetime to be on the order of 130−140 fs. 46,51,55,58Generally, the quantum yield of HT has been reported to lie in the range of 40−60%. 44,51,54,59.2.Feature Selection.In high-dimensional problems, such as the trajectories reported here, one must identify a small sample of data features on which clustering can be performed.This is largely a result of the curse of dimensionality, with the volume spanned by the data increasing exponentially with the number of features considered.Furthermore, inclusion of a number of similar and highly correlated features may bias the clustering result and lead to problems with interdependencies.If features that are irrelevant to the key trends in the data are included, overfitting may also occur, preventing the algorithm from identifying the underlying structure of the data.
The selection of a suitable feature set for time-dependent data presents additional challenges due to temporal correlations between features.Fortunately, a wide range of methods for feature selection exist based on chi-squared tests, variance thresholding, covariance analysis, regularization, or decision trees. 36,65−69 Note that while PCA is commonly employed in clustering analysis, the selected principal components are often not suitable for describing the dynamic evolution of a system across all time steps.
In this work, we employ a variance thresholding technique in which a minimal number of high variance features are selected for clustering.The benefit of this approach lies in its simplicity.While trajectory data that models photoexcited dynamics often includes a large number of coupled degrees of freedom, many of these may be redundant, and quite often key nuclear coordinates can be identified by the inspection of their variance over time.In the present case, plain internal coordinates are used as the basis, although we note that in more complicated cases, alternative representations, such as curvilinear coordinates, may provide better clustering results.In the case of CHD, we select a total of four high variance internal coordinates, as shown in Figure 3. Dihedral angles ϕ a and ϕ c are crucial in describing the isomerization of cZc-HT to tZt-HT on the ground state, whereas dihedral angle ϕ b describes the out-of-plane torsion exhibited by the molecule along the near flat 2A electronic state at later times.As the three dihedral angles do not allow the distinction between cZc-HT from vibrationally hot CHD, we also include the C 1 −C 6 bond.

Choice of Methodology.
A number of approaches are available to tackle the problem of identifying the temporal trends in time-dependent data.The appropriate choice depends on the characteristics of the data in question.Typically, data, such as stock prices, 15 temperature measurements, 70 and some specific types of medical data (e.g.electrocardiograms 71 and MRI observations 72 ), are modeled as time series consisting of observations at regular time intervals. 73,74In contrast, trajectory data is usefully considered as a group of moving objects, such as vehicles, 12,75 people, 76 or animals, 13 with each trajectory describing a particular evolution of associated spatial coordinates over time. 77,78onsidering the trajectories as a group of moving objects, the spatiotemporal approach to clustering involves applying a clustering algorithm to each independent time step of the trajectories, generating a database of independent spatial clusters at each time. 77This is represented schematically in Figure 4, which shows a snapshot of four trajectories at three different times.At each time, the trajectories clustered together are contained within the gray ellipse.The temporal patterns are then mined from the resulting database of clusters by identifying the groups of trajectories that remain clustered across time.For example, in Figure 4, there is a clear spatiotemporal correlation between trajectories [o 1 , o 2 , o 4 ] across all three times.More generally, there will exist a multitude of possible definitions of spatiotemporal clusters.Thus, the mining of temporal patterns will depend on the constraints provided for the chosen algorithm.Moving clusters, for instance, define a spatiotemporal cluster as a sequence of spatial clusters that meet a minimum overlap threshold over a specified number of consecutive time steps, whereas convoys are defined by the minimum number of trajectories continually clustered together over time.However, such approaches toward mining spatiotemporal patterns are often challenging  and can be computationally demanding given that one must cluster each time step independently first.
An alternative approach is to consider the trajectory data as a multivariate time series, in which each coordinate is represented by its own sequence (time series). 73Although dependent on the chosen similarity measure and overall length of the time series, this approach is generally simpler and faster than the spatiotemporal treatment of trajectory data.One can, for example, construct a pairwise distance matrix, calculated across all time steps, that can be used as input for the clustering algorithms.Moreover, certain distance (similarity) measures can account for small local variations in speed or temporal offset between trajectories, 79 as discussed in the following subsection.
With the aim of this paper being to demonstrate an approach for identifying concentrations of reaction flux that is both straightforward and that can be performed at low computational expense, we focus on the time-series treatment of trajectory data and mark the more involved spatiotemporal approach as something that could be considered in future work.We also note that although deep-learning approaches for identifying temporal patterns in trajectory data exist, 80 we focus on unsupervised algorithms that do not require a complicated training phase and thus can be quickly deployed in an almost automatic fashion to aid the interpretation of simulation and experimental observables. 81,82.4.Computational Details.Prior to clustering, the selected features must be normalized since their numerical magnitude may vary significantly.For example, in our application, the dihedral angles span a significantly larger range of values than the C 1 −C 6 bond length.The normalization thus limits the risk that features which span a larger numerical range dominate the clustering results.Here, we utilize the so-called min−max normalization, in which we map each feature to the range [0, 1] as where o if refers to the trajectory o i along the feature dimension f and o if ∈ O f , where O f is the set of all trajectories across feature f.With the normalized data, we proceed to construct a pairwise distance matrix, where each element corresponds to a single scalar value that reflects the similarity of the corresponding pair of trajectories, considering all features included (in the current example four features).The choice of distance and similarity measure depends on the data.In this paper, we consider the multidimensional generalization of the L p norm and multidimensional dynamic time warping (MD-DTW). 79,83Starting with the L p norm, this considers the distance between each sequential point in the time series and involves a 1:1 mapping of time points, as demonstrated in Figure 5a.The multidimensional generalization of the L p norm for two trajectories o i and o j is defined as with N t the total number of time steps and N f the total numbers of features.In the following, we exclusively use p = 2, i.e., the L 2 norm which is the generalization of the Euclidean distance.The advantage of this norm is its conceptual and computational simplicity.A drawback is that the L p norms require that both trajectories are the same length in time and therefore might not be applicable to trajectory data where the duration of trajectories varies (for instance, if trajectories are not propagated beyond fragmentation).Moreover, L p norms have been shown to underperform on high-dimensional data. 84n addition, the 1:1 mapping of time points is insufficiently flexible to account for small variations on the time axis.For example, two trajectories may be highly similar apart from a slight offset in time or exhibit a small local variation in frequency.In some cases, this may result in the misclassification of trajectories as the full extent of their similarity is not captured.Some of these shortcomings can be overcome by dynamic time warping (DTW), 85,86 which is the second similarity measure that we evaluate.This was originally developed for applications in speech recognition. 85−89 In brief, DTW allows for a degree of stretching or contraction on the time axis.This enables 1:many mapping of temporal indices, as illustrated in Figure 5b, meaning one can compare two time series that are similar but have local variations in frequency and temporal offsets.Furthermore, this makes it possible to compare time-series data of different lengths.Note that depending on the implementation, DTW may be fully frequency invariant, which could lead to erroneous matching of different vibrational frequencies.However, it is possible to constrict the degree to which this happens.As such, we favor limiting the amount of frequency invariance and primarily suggest the use of DTW to account for small temporal offsets between similar trajectories, as outlined in the Supporting Information.
The mapping of temporal indices is given by the warping path, which describes how a pair of trajectories are warped in time so that they are optimally aligned.The warping path is calculated by tracing through a cost matrix, which defines all the possible ways the two time series can be warped in time.Further details on how the warping path is calculated from the cost matrix and the dynamic programming procedure can be found in the Supporting Information.Once the warping path is calculated, the total cost of alignment is given by the sum of all its elements, which serves as the measure of distance between two trajectories.While a plethora of clustering algorithms exist, including agglomerative-, 90 centroid-, 91 and probabilistic 92 -based methods, we choose to focus on density-based approaches. 93enerally, these aim to assign data points to clusters based on the local density of points within a given volume.The definition of density varies between methods.In the case of DBSCAN, 94 which is the method explored in this paper, core regions of density are built up by identifying regions where a minimum number of data points (given by the parameter MinPts) lie within a given radius (ϵ) of each other.We focus on DBSCAN due to its ability to detect nonconvex clusters without requiring the user to specify a predefined number clusters.Furthermore, DBSCAN inherently includes outlier detection, marking data points that do not lie in sufficiently dense regions of the feature space, defined by MinPts and ϵ, as noise.This makes it less sensitive to outlier data points that do not correspond to the main trends in the dataset.An additional benefit of the DBSCAN algorithm is that it can be easily modified to accept precomputed pairwise distance matrices as input.Constructing a distance matrix using both the multidimensional L 2 norm and MD-DTW, the whole time series can be clustered globally over all time.A more detailed discussion of the DBSCAN algorithm and the types of clusters that may be identified is provided in the Supporting Information.
Having run the DBSCAN algorithm, one obtains N C clusters, with C I the number of members in cluster I ∈ N C .The validity of the clustering can be assessed by considering the number of trajectories classified as noise N noise a and the silhouette coefficient, which provides a statistical measure of how well a trajectory is assigned to its cluster.The silhouette coefficient S i for trajectory o i in cluster I is defined as 95 This has two components.The a i term is the mean distance between trajectory o i and all other trajectories within its cluster where |C I | is the number of members in cluster I.A small value of a i therefore means that trajectory o i is well assigned to cluster I.The b i in eq 3, on the other hand, is the mean distance of trajectory o i to the trajectories in its nearestneighbor cluster J, where the min{} function is used to identify the nearest neighbor among all clusters J ∈ N C .A large value of b i indicates that trajectory i is distinct from the trajectories in the nearestneighbor cluster.Together a i and b i define the silhouette coefficient S i , whose values are in the range S i ∈ [−1, 1].The closer S i is 1, the more appropriately the trajectory is clustered.However, if S i < 0, then trajectory o i would be more appropriately assigned to its nearest-neighboring cluster.The average value of the silhouette coefficient over all clustered data points (S avg ) provides a global measure that can be used to assess the validity of the clustering.

RESULTS AND DISCUSSION
We shall now turn to a discussion of the results obtained by the clustering of the data set consisting of CHD trajectories presented in Section 2.1 using the four features as shown in Figure 3 (i.e., the three dihedral angles ϕ a , ϕ b , and ϕ c and the R C1−C6 bond length).The similarity of every trajectory pair is calculated using both the L 2 norm and MD-DTW as discussed in Section 2.4.The resulting distance matrix is then provided as a precalculated input into the DBSCAN algorithm.In each case, we scan a suitable range of possible ϵ and MinPts values and analyze the clustering results.
3.1.Multidimensional L 2 Norm.With the L 2 norm as a similarity measure, the parameters ϵ and MinPts are scanned in the ranges ϵ ∈ [1, 1.5] and MinPts ∈ [3, 7], respectively, repeating the clustering for each pair of parameters.The results are presented in Figure 6, which shows the number of clusters N C , the average value of the silhouette coefficient S avg , and the number of trajectories marked as noise N noise .Good clustering is reflected in large values of the average silhouette coefficient S avg .Generally, the highest values of S avg result in either N C = 3 or N C = 4 clusters being identified.Note, for many combinations of ϵ and MinPts, there is little variation in the value of S avg , making the identification of the best set of parameters, and thus also the ideal number of clusters, somewhat ambiguous.The best combinations of parameter values are indicated by the circled points in the heatmaps, with the corresponding values given in the table in Figure 6.
Overall, the region corresponding to ϵ > 1.3 and MinPts < 6 yields the smallest values of S avg .For these parameter values, the definition of the density neighborhood is too diffuse, and trajectories are grouped into only N C ≤ 2 broad clusters.Increasing the value of MinPts to ≥6 raises the density requirement, enabling N C = 3 clusters to be identified.In this range, the best results are given by (ϵ, MinPts) = (1.35,7), circled in orange in the heatmaps in Figure 6.As seen in the table in Figure 6, these parameters yield N noise = 5 and S avg = 0.386.Moreover, the intermediate range of ϵ, where 1.175 ≤ ϵ ≤ 1.3, yields N C = 3 with the highest value of S avg across all values of MinPts, indicating optimal cluster separability.However the number of trajectories marked as noise increases above 10 for high values of MinPts.The best N C = 3 case is given by (ϵ, MinPts) = (1.2, 4) and is circled by a blue oval in each panel in Figure 6.In this case, only seven trajectories are marked as noise, with a slightly higher value of S avg = 0.389 compared to (ϵ, MinPts) = (1.35,7).Finally, when ϵ ≤ 1.15, the density neighborhood is sufficiently small to identify N C = 4 clusters, but with as many as 15−20 trajectories marked as noise.However, picking a lower value of MinPts = 3 yields reasonable levels of noise.For this particular case, (ϵ, MinPts) = (1.15,3), N C = 4 clusters are identified with only seven noise trajectories (green oval in Figure 6).This yields S avg = 0.37, which is slightly lower than the N C = 3 region.
To better understand the clustering results, we turn to Figure 7, which shows the values of the individual silhouette coefficients S i for each trajectory in its respective cluster C i .Looking first at the (ϵ, MinPts) = (1.15, 3) case in the lefthand panel, we notice that the S i values of 1 , C 3 , and C 4 are all above the average of S avg = 0.37, indicating that the trajectories therein are well clustered.However, the broad shaded area corresponding to C 2 indicates the existence of a large cluster that contains some trajectories for which S i < S avg .In addition, C 2 contains a number of trajectories for which S i < 0, meaning that they are unfavorably clustered and would in principle be better assigned to the nearest-neighboring cluster instead.Turning to the (ϵ, MinPts) = (1.2, 4) case in the right panel of Figure 7, we see that cluster C 4 is absorbed by cluster C 2 .However, the number of trajectories with S i < 0 is reduced.This, and the slight increase in S svg from 0.37 to 0.389, indicates that (ϵ, MinPts) = (1.2, 4) results in a marginally improved clustering.The reaction products identified by the center of each cluster at the final time are shown in Figure 8.For the N C = 3 case, the products correspond to vibrationally hot CHD (C 1 ), a set of structures with large torsional angles (C 2 ), and tZt-HT (C 3 ), which is formed via isomerization of cZc-HT.The N C = 4 case results in the identification of an additional product (C 4 ), corresponding to cEc-HT.
Having identified the N C = 3 case with (ϵ, MinPts) = (1.2, 4) as the statistically better result, although marginally so, we now turn to discuss the distribution and evolution of the clusters in time.The resulting clusters C 1 through to C 3 can be seen in Figure 9, where each panel refers to one of the four features on which clustering was performed.Before approximately 40 fs, all clusters exhibit a similar evolution, with the CHD ring structure strained upon excitation.The strained geometry is indicated by the increase in the dihedral angles ϕ a and ϕ c , as well as the C 1 −C 6 bond length.The straining of the ring system continues, with oscillations in ϕ a and ϕ c and a simultaneous sharp increase in the dihedral ϕ b .Following this, the cluster centers begin to separate as the trajectories disperse, evolving along different reaction pathways.Cluster C 1 returns to vibrationally hot CHD after several oscillations in the C 1 − C 6 distance, which dampen in time.Turning to cluster C 3 , we observe that the C 1 −C 6 bond length increases at a faster rate than the other clusters.This occurs as cZc-HT forms on the ground state.For times exceeding 100 fs, C 3 begins to increase in both the ϕ a and ϕ c dihedral angles, commensurate with the isomerization of cZc-HT to tZt-HT.Characterizing the pathway for cluster C 2 is less clear-cut, with the cluster exhibiting significant broadening in all four features.This broadening is most significant for the C 1 −C 6 distance and ϕ b dihedral, which span almost the whole range of available distances and angles.This is likely the result of a very disperse wave packet limiting the separability of the data, at least based on nuclear geometries alone.However, we do note that the cluster is centered on the values of ϕ b around 100°and ϕ a /ϕ c around zero degrees.Thus, the densest component of C 2  Turning to the N C = 4 case with (ϵ, MinPts) = (1.15,3), plotted in Figure 10, we observe the same general trends as shown in Figure 9.However, we see that some trajectories are redistributed from the broad C 2 cluster to a new cluster C 4 .While the dihedral angles ϕ b still span a large range, this is somewhat reduced.Recall that here cluster C 2 contains a greater number of trajectories with negative values of the silhouette coefficient S i , as shown in Figure 7.It is, therefore, likely that the very broad nature of the wave packet here results in clusters that have a degree of overlap and that some trajectories within C 2 are misclassified.The additional cluster C 4 observes very small values of the dihedral angles ϕ a /ϕ c .In addition, the sharper increase in the C 1 −C 6 bond length, and the fact the dihedral angle ϕ b exceeds 200°, suggests this cluster is commensurate with the cEc-HT product channel.Here, we note that both ϕ a and ϕ c exhibit strong oscillations before the central dihedral ϕ b rapidly increases, suggesting that the initial ring strain transfers energy to the dihedral ϕ b .
We now shift our focus to the electronic-state populations for each cluster.The populations are not included in the feature set in the current clustering process and provide additional insight into the nature of each identified cluster.In Figure 11 we see the populations of the S 1 and S 0 state plotted for every trajectory in each cluster at 150 fs.The dashed line x = y is drawn indicate that trajectories below and above this line are mostly S 0 and S 1 in character, respectively.Moreover, the left and right hand panels show the N C = 4 and N C = 3 clustering results, respectively.In both panels, we see that clusters C 1 and C 3 are mostly S 0 in character, confirming that they correspond to the return of the wave packet to groundstate CHD and the ground-state HT product channel that isomerizes from cZc-HT to cZt-and tZt-HT.However, the character of C 2 is again less clear.While most trajectories are dominated by S 1 population, some trajectories have higher S 0 population.This supports the idea that the densest part of C 2 defines the part of the wave packet that remains trapped on the excited state, dispersing into a broader range of ϕ b angles due to the low gradients on the PES.It also suggests that some of the trajectories are clustered unfavorably, a fact also supported by the negative S i values.In the left-hand panel, we see that in the N C = 4 case, some of the C 2 trajectories on the ground state are relabeled as C 4 , corresponding to cEc-HT.

Multidimensional Dynamic Time Warping.
To improve on the issue with some misclassified C 2 trajectories when using the L 2 norm, we turn to MD-DTW as a more flexible similarity measure.Given that MD-DTW can account for small variations in offsets between trajectories and small local changes in frequency, it has the potential to provide a   more accurate measure of similarity between the temporal series, thus improving separability compared to when the L 2 norm is used.
Here, the parameters ϵ and MinPts are scanned in the ranges ϵ ∈ [6.5, 8] and MinPts ∈ [3, 7], respectively.The result of the clustering is shown in Figure 12  Turning to Figure 13, which shows for the best N C = 3 and 4 cases the silhouette coefficient S i for each trajectory in its respective cluster, we see that the C 4 cluster is significantly larger in comparison with the N C = 4 case in Figure 7. Here, additional trajectories are acquired from the C 2 cluster.Furthermore, C 2 appears to have a smaller number of trajectories with negative S i values.For the set of parameters (ϵ, MinPts) = (7.65,6), which has the highest S avg , the C 4 cluster is again absorbed by the C 2 cluster.Note that apart from a small number of trajectories, the N C = 3 case is practically indistinguishable from the corresponding N C = 3 results identified by the L 2 norm.However, the N C = 4 case, as identified by MD-DTW, is unique.We shall, therefore, focus the rest of our discussion on this result, noting that the C 4 cluster is always absorbed by the C 2 cluster in the set of N C = 3 results.
The plot of the features for the center of each cluster in the N C = 4 case is shown in Figure 14.The    these clusters difficult.Based on nuclear geometries alone, it is not immediately clear if additional trajectories in C 4 correspond to ground-state cEc-HT.Without additional information, the separation of the two groups of trajectories, corresponding to ground-state cEc-HT and the remainder of the excited wave packet with larger torsional angles, may not be possible.Analogously to the L 2 norm as shown in Figure 9, the N C = 3 case appears by clusters C 2 and C 4 in Figure 14 merging to form a very broad and large cluster C 2 .
Turning to Figure 15, which shows the final-time (t = 150 fs) distribution of populations observed within each cluster, we see that the N C = 3 case (right panel) is nearly identical to the L 2 norm results as shown in Figure 11.In the left panel, we see the N C = 4 result that is unique to MD-DTW.Here, we see a number of additional C 4 trajectories with a population above the x = y line.Thus, the additional trajectories contained within the C 4 cluster are dominated by the S 1 state amplitude.These trajectories belong to the remainder of the excited wave packet that exhibits a larger degree of torsion and not cEc-HT formation on the ground state.However, we note that these trajectories have a population that is nearly equal for the S 0 and S 1 states.Were the simulations to run longer, these may completely decay to the ground state and yield the cEc-HT product.Moreover, we see that while the C 2 cluster is composed mostly of trajectories with dominant S 1 character, it still contains a number of trajectories with dominant S 0 character.

CONCLUSIONS
The approach demonstrated in this paper is general and applicable to any trajectory-based simulation method.It is easy to implement, computationally efficient, and provides a useful tool that can be deployed with little additional effort for quick identification of key reaction and product channels.In addition, once the clusters have been obtained, a small set of representative trajectories can be identified that can be used for, e.g., high-level calculations of observables.
Notably, the clustering is applied to the entire basis, and the weights of the individual trajectories are not considered at the clustering stage.While the size of each resulting cluster gives some indication of the significance of the corresponding reactive pathway, in the case of e.g. the AIMCE trajectories considered in this paper, a more accurate measure would involve integration over the complex expansion coefficients of the trajectories, as outlined in the Supporting Information.
Generally, the clusters identified have a degree of overlap and vary in both size and density.In part, the difficulty in achieving good separation of the data relates to the limited ability of DBSCAN to identify clusters of varying density, a known issue, and that we have applied the algorithm to a test system in which the wave packet at later times is quite disperse.Nevertheless, given the nature of the trajectory data and that the time-series approach requires an algorithm capable of utilizing precomputed distance matrices, DBSCAN is the best choice.Other algorithms that accept precomputed distance matrices, such as k-means or hierarchical clustering, are generally not suitable and in testing we found them to yield noninterpretable and nonsensical results, although we note that this observation may to some degree be systemdependent.
The time-series methodology does have drawbacks.First, the similarity measures are global over time, which means that smaller amplitude motions at earlier times can wash out in systems that exhibit large amplitude motion at later times, such as CHD.The clustering results are, in those cases, biased toward later times, where the system has settled into a distribution of different products.This is likely the cause of the similar results obtained with both the multidimensional L 2 norm and the MD-DTW.As such, the potential benefit of using MD-DTW does not outweigh the additional computation expense, and we favor the use of the more simplistic L 2 norm in systems with large amplitude motion.For both similarity measures, we see that based on nuclear geometries alone, the time-series approach can struggle to separate trajectories contained in a broad distribution of dihedral angles.Although MD-DTW achieves better separation, it still misclassifies some trajectories.On a positive note, both metrics succeed in identifying four separate reaction channels and provide basic insight into the decay pathways available to the excited system.
Second, consider the scenario where a large swarm of trajectories decay through a CoIn, leaving a number of trajectories behind that undergo several additional oscillations on the excited state before decaying.The straggling trajectories slowly leak through the CoIn, before converging on the product.Given that the separation of trajectories in the timeseries approach depends largely on the time at which a significant number of trajectories assume a different geometry, it is only able to capture the large swarm trajectories that decay over a similar time period.The use of MD-DTW may compensate for this to a greater extent than the L 2 norm due to its ability to account for offsets in time.However, in our current example, the choice of similarity metric had little effect on the clustering result.Finally, for very long simulations, the use of MD-DTW is not practical due to its unfavorable scaling with the length of the temporal series, although restrictions may be placed on the number of warping paths considered, thus increasing efficiency.The multidimensional L 2 norm is also limited by its poor performance in high-dimensional spaces.
In the application to CHD, we demonstrate that a suitable set of four features can be selected from the internal coordinate based on their high variance.In systems that exhibit a wider and more complex range of nuclear rearrangements, more features may have to be included in the clustering.Generally, increasing the number of features also increases the sampling requirement, and therefore, the results have a dependence on the number of trajectories and the underlying statistical distribution of the data.If the sampling of phase space is insufficient, there is a risk that minor pathways will be marked as outliers.Thus, sufficient sampling of the phase space is just as important for accurate clustering, as it is for overall accurate propagation of the dynamics.In terms of features, in some cases, the rearrangement of the nuclear geometry may be sufficiently complex to warrant alternative representations of the nuclear motion beyond internal coordinates.This could include constructing coordinates or displacement vectors that capture the coupled simultaneous motion of several atoms.Future work should, therefore, address the limits of internal coordinates and the construction of alternative coordinates.Furthermore, the use of more sophisticated approaches that construct reduced dimensionality representations of the dynamics poses an interesting avenue for further exploration.For instance, one may envision the use of autoencoders to encode the time-dependent trends of the dynamics in a reduced latent space.
While the current work allows for a rapid first-order identification of different reaction channels available, it cannot describe more intricate details of the nuclear and electronic dynamics observed.Future work to be undertaken should, therefore, focus on methods that provide more in-depth insights into complex trajectory datasets, likely by exploring the applicability of the more demanding spatiotemporal treatment described in Section 2.3.Due to the large overlap of trajectories within the Franck−Condon region at early times and the subsequent emergence of different reaction pathways with variable probability density, the difficulty lies in achieving good clustering over all times without having to reparameterize the chosen clustering algorithm at each independent time step.Should such an approach be successfully realized, this may be capable of not only identifying the different reaction pathways but also the times at which the trajectories diverge.Furthermore, one could envision the inclusion of additional information in the clustering process, such as momenta or electronic populations, allowing for improved refinement of the key trends observed and providing insight into the changing electronic character during the evolution of the system.
In summary, we have demonstrated an automatic and unbiased computational procedure for harvesting physical and chemical insights from large sets of high-dimensional and highly complex trajectories.We have shown that clustering using the unsupervised density-based DBSCAN algorithm provides a computationally efficient method for classifying trajectories, allowing the main trends in the probability flux to be rapidly identified at low computational cost.This methodology slots naturally into the standard workflow for simulations of photoexcited dynamics that employ trajectorybased (quantum) molecular dynamics methods.Compared to deep-learning approaches, this method requires no complicated training phase and can be deployed in an almost automatic fashion to aid the interpretation of simulations and experimental observables.
Background on the AIMCE simulations used to calculate the trajectories, an overview of types of clusters and clustering algorithms, and further details on the identification or reactive pathways and dynamic time warping (PDF) ■

Figure 1 .
Figure 1.Photoinduced electrocyclic ring-opening of CHD to HT.The diabatic electronic states are shown schematically, with red points indicating conical intersections.Following formation of cZc-HT on the ground state, several cis/trans HT isomers are accessible.

Figure 2 .
Figure 2. Net populations | | a t ( ) k 2 calculated as an average over the full set of N trj = 100 AIMCE trajectories for each of the k = 0−2 electronic states as a function of time.At early times, the S 1 state corresponds to the diabatic 1B state.At 30 fs, a maxima in population of the S 2 state is observed as the wave packet passes through the 2A/ 1B intersection, at subsequent times, the S 1 state is of 2A character.

Figure 3 .
Figure 3. High variance internal coordinates that form the basis of the feature space on which clustering is performed.The selection includes the dihedral angles ϕ a , ϕ b , and ϕ c , as well as the bond length R C1−C6 .

Figure 4 .
Figure 4. Schematic representation of the mining of spatiotemporal clusters.The evolution of four trajectories, [o 1 , o 2 , o 3 , o 4 ], in a twodimensional feature space is shown at three times, t 1 , t 2 , and t 3 , each represented by a slice.The gray ellipses mark the spatial clusters identified by application of a spatial clustering algorithm at each time independently.

Figure 5 .
Figure 5. Mapping of two time series, shown in blue and orange.(a) Direct 1:1 mapping, as in the L 2 norm, (b) DTW allows 1:many mapping, allowing similar sequences to be mapped onto each other even when not perfectly aligned.In panel (b), the different possible mappings of temporal indices are indicated by their color, with yellow representing a 1:1 mapping of equivalent temporal indices, magenta a 1:many mapping, and green the warping of the temporal axis through a 1:1 mapping between indices at different time steps.

Figure 6 .
Figure 6.Results of clustering using the L 2 norm as a measure of trajectory similarity.The heatmaps shown correspond to (top left) the average silhouette coefficient S avg , (top right) the number of resulting clusters N C , and (bottom left) the total number of trajectories marked as noise N noise .Note that the range of the heatmap for S avg does not reflect the full [−1, 1] range available to S avg in order to visualize the variations correctly.The table (bottom right) shows the values of N C , N noise , and S avg for the good clustering results with N C = 3, 4, encircled by correspondingly colored ellipses in the heatmaps.

Figure 7 .
Figure 7. Values of the silhouette coefficients (S i ) for each trajectory in each cluster C I , with the whole area from zero to S i filled by the color of each cluster.The left panel shows results for (ϵ, MinPts) = (1.15, 3) and the right for (ϵ, MinPts) = (1.2, 4).The height of the shaded area reflects the size of each cluster.The two sets of parameters for which results are shown correspond to the best results for N C = 3 and 4, in each case with seven trajectories excluded as noise.The red dashed lines indicate the average silhouette coefficient, S avg .

Figure 8 .
Figure 8. Set of four product channels identified by clustering.Shown are the final geometries at the center of each cluster.C 1 corresponds to the return of the wave packet to hot CHD, C 2 to the part of the wave packet that remains on the excited state and exhibits larger torsional angles, and C 3 to the ground-state tZc/tZt HT product.Finally, C 4 corresponds to a cEc-HT product.

Figure 9 .
Figure 9. Plots of the cluster centers in each of the four selected features for ϵ = 1.2 and MinPts = 4 (using the L 2 norm).The top left and right panels show the C 1 −C 6 bond and the central dihedral angle ϕ b , whereas the bottom left and right panels show the dihedral angles ϕ a and ϕ c .The shaded areas represent the size of the cluster.

Figure 10 .
Figure 10.Plots of the cluster centers in each of the four selected features for ϵ = 1.15 and MinPts = 3 (using the L 2 norm).The top left and right panels show the C 1 −C 6 bond and the central dihedral angle ϕ b , whereas the bottom left and right panels show the dihedral angles ϕ a and ϕ c .The shaded areas represent the size of the cluster.
in terms of N C , N noise , and S avg .Generally, performing DBSCAN clustering using the MD-DTW similarity measure yields similar results to using the L 2 norm, with either three or four clusters identified.The highest values of S avg are observed as MinPts increases and ϵ lies in the range 7.6 ≤ ϵ ≤ 8.This results in N C = 3 clusters.The highest value of S avg = 0.396 corresponds to (ϵ, MinPts) = (7.65,6), indicated by the purple ellipse in each heatmap.The intermediate range where ≤ ϵ ≤ 7.6 consistently results in N C = 3 clusters.The higher values of MinPts result in an excessive number of trajectories marked as noise.However, the values of ϵ ≤ 7.7 and MinPts ≤ 6 consistently result in N C = 4 clusters.The number of trajectories excluded as noise increases with MinPts.In addition, the value of S avg decreases slightly in comparison with the N C = 3 results.The set of values (ϵ, MinPts) = (6.75, 3) results in N C = 4 clusters with lower noise levels and a reasonable value of S avg = 0.332, and is indicated by the green ellipse in the heatmaps.
C 4 cluster now includes a broader range of dihedral angles ϕ b in comparison with the L 2 norm case in Figure 10.At the end of time, this cluster is centered at a slightly lower value of ϕ b (170°).However, it still contains trajectories that exceed angles ϕ b > 200°.Also note that the trajectories with lower values of ϕ b come from the C 2 cluster, which is now slightly smaller than in the L 2 norm case.The broad range of dihedral angles ϕ b makes separation of

Figure 12 .
Figure 12. Results of clustering when using DTW as a measure of trajectory similarity.The scale of the S avg heatmap is chosen to highlight the differences in otherwise small variations.The table shows data for the two best clustering results with N C = 3 and 4, highlighted by colored ellipses in the heatmaps.The corresponding figure for the L 2 norm is Figure 6.

Figure 13 .
Figure 13.Silhouette coefficient (S i ) for each trajectory in its respective cluster C i (using MD-DTW).The left-hand panel shows the best N C = 4 set of parameters, and the right-hand panel, the best N C = 3 case.In each case, the red dashed line indicates the value of S avg .The corresponding figure for the L 2 norm is Figure 7.

Figure 14 .
Figure 14.Plots of the cluster centers in each of the four selected features for ϵ = 6.75 and MinPts = 3 (using MD-DTW).The top left and right panels show the C 1 −C 6 bond and the central dihedral angle ϕ b , whereas the bottom left and right panels show the dihedral angles ϕ a and ϕ c .The shaded areas represent the size of the cluster.

Figure 15 .
Figure 15.Distribution of S 0 and S 1 state populations at time t = 150 fs using MD-DTW.Points are color-coded according to their cluster assignment.The left panel shows the N C = 4 resulting from (ϵ, MinPts) = (6.75,3), and the right panel shows the N C = 3 case with (ϵ, MinPts) = (7.65,6).